A note on security
-
Although pandoc itself will not create or modify any files other than those you explicitly ask it to create (with the exception of temporary files used in producing PDFs), a filter or custom writer could in principle do anything on your file system. Please audit filters and custom writers very carefully before using them.
-
Several input formats (including LaTeX, Org, RST, and Typst) support `include` directives that allow the contents of a file to be included in the output. An untrusted attacker could use these to view the contents of files on the file system. (Using the `--sandbox` option can protect against this threat.)
-
Several output formats (including RTF, FB2, HTML with `--self-contained`, EPUB, Docx, and ODT) will embed encoded or raw images into the output file. An untrusted attacker could exploit this to view the contents of non-image files on the file system. (Using the `--sandbox` option can protect against this threat, but will also prevent including images in these formats.)
-
In reading HTML files, pandoc will attempt to include the contents of `iframe` elements by fetching content from the local file or URL specified by `src`. If untrusted HTML is processed on a server, this has the potential to reveal anything readable by the process running the server. Using `-f html+raw_html` will mitigate this threat by causing the whole `iframe` to be parsed as a raw HTML block. Using `--sandbox` will also protect against the threat.
-
If your application uses pandoc as a Haskell library (rather than shelling out to the executable), it is possible to use it in a mode that fully isolates pandoc from your file system, by running the pandoc operations in the `PandocPure` monad. See the document Using the pandoc API for more details. (This corresponds to the use of the `--sandbox` option on the command line.)
-
Pandoc’s parsers can exhibit pathological performance on some corner cases. It is wise to put any pandoc operations under a timeout, to avoid DoS attacks that exploit these issues. If you are using the pandoc executable, you can add the command line options `+RTS -M512M -RTS` (for example) to limit the heap size to 512MB. Note that the `commonmark` parser (including `commonmark_x` and `gfm`) is much less vulnerable to pathological performance than the `markdown` parser, so it is a better choice when processing untrusted input.
-
The HTML generated by pandoc is not guaranteed to be safe. If `raw_html` is enabled for the Markdown input, users can inject arbitrary HTML. Even if `raw_html` is disabled, users can include dangerous content in URLs and attributes. To be safe, you should run all HTML generated from untrusted user input through an HTML sanitizer.
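
Several of the precautions above can be combined when an application shells out to the pandoc executable. The following is a minimal sketch in Python, not a definitive recipe: the helper names `build_pandoc_command` and `convert_untrusted`, the 10-second timeout, and the assumption that `pandoc` is on the `PATH` are all illustrative. It passes `--sandbox`, selects the `commonmark` reader, caps the heap with `+RTS -M512M -RTS`, and runs the process under a timeout.

```python
import subprocess


def build_pandoc_command(reader="commonmark", writer="html", heap_limit="512M"):
    """Return an argv list for a sandboxed, heap-limited pandoc run.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    return [
        "pandoc",
        "--sandbox",                        # no file or network access for readers/writers
        "-f", reader,                       # commonmark is less prone to pathological parsing
        "-t", writer,
        "+RTS", f"-M{heap_limit}", "-RTS",  # cap the heap size
    ]


def convert_untrusted(text, timeout_seconds=10.0, **options):
    """Convert untrusted input read from stdin, giving up after a timeout.

    Raises subprocess.TimeoutExpired if pandoc runs too long, and
    subprocess.CalledProcessError if pandoc exits with a non-zero status.
    """
    result = subprocess.run(
        build_pandoc_command(**options),
        input=text,
        capture_output=True,
        text=True,
        timeout=timeout_seconds,  # the child process is killed on expiry
        check=True,
    )
    return result.stdout
```

A call such as `convert_untrusted(user_text)` returns the converted HTML as a string. That HTML should still be passed through an HTML sanitizer before it is served, since neither the sandbox nor the timeout makes the generated markup safe.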